Abstract — In this paper we study the power-performance relationship of power-efficient computing from a queuing theoretic perspective. We investigate the interplay of several system operations, including processing speed, system on/off decisions, and server farm size. We find that there are often "sweet spots" in power-efficient operation: there exist optimal combinations of processing speed and system settings that maximize power efficiency. For the single server case, we study a widely deployed threshold mechanism and show that there exist optimal pairs of processing speed and threshold value that minimize the power consumption. The same holds for the threshold mechanism with job batching. For the multi-server case, we show that there exist optimal combinations of processing speed and server farm size.

Index Terms — Power-efficient computing, queuing theory, data 
center network 



I. Introduction 

Large-scale data center networks are now in widespread use. Applications running on such clustered servers include web search, e-commerce, and compute-intensive applications. However, today's data centers spend a large amount of capital on power usage and the associated infrastructure: around 40% of the total operating cost is related to power distribution, cooling and electricity bills [1]. In 2005, the total data center power consumption was 1% of the total U.S. power consumption and caused as much emission as a mid-sized country such as Argentina [2]. Emphasizing the importance of these issues, the U.S. Environmental Protection Agency recently raised concerns with Congress about the growing power consumption of data centers [3].

Much of the power consumed by data centers is wasted: servers are on average only 10% to 50% utilized [1], [4]-[6]. Low utilization is endemic to data center operations because capacity is provisioned for peak workload to meet strict service level agreements. However, due to the lack of "power proportionality", an idle server still consumes 60% of its peak power, drawn mainly by peripherals such as DRAM, hard disk drives (HDDs), and network interface cards (NICs). Thus, to conserve power it is preferable to shut down idle servers. In server farms consisting of multiple servers, jobs can be consolidated onto a few servers so that the rest can be shut down. Server on/off decisions are often made in conjunction with processing speed adjustments. Dynamic voltage and frequency scaling (DVFS) is a conventional processing speed adjustment technique that changes the processor's clock frequency (and thus the speed of computation) according to workload conditions in order to reduce power consumption. Server on/off decisions (also known as dynamic component shut-down) and processor speed adjustment (also known as dynamic performance scaling) can be categorized as shown in Figure 1 (see Figure 5 in [7] for a complete diagram).

Fig. 1. Dynamic power management categories [7].

Our results tie into many earlier works in both the computer architecture and queuing theory communities. The authors in [1] study a power saving method that shuts down servers when they are idle and characterize the power-delay tradeoff from a queuing theoretic perspective. However, they do not consider performance scaling in their analysis. The authors in [8] investigate power reduction possibilities for jobs that demand fast response. They suggest that system-wide coordinated power management provides a far better power-latency tradeoff than individual uncoordinated decisions. The work in [9] makes a similar point: studying power management for MapReduce tasks, the authors suggest that all nodes in a MapReduce cluster should be powered up and down together rather than individually in a distributed fashion. The authors in [10] highlight the challenge of avoiding negative power savings, which occur when the overhead of implementing a power-saving mechanism exceeds the resulting savings, so that the system consumes extra power. They suggest guard mechanisms to monitor negative power savings and the performance degradation caused by power-saving routines. The impact of data center size on power efficiency is evaluated in [11]. Most of the above works consider variants of a fixed threshold mechanism, in which a server is shut down whenever it has been idle for longer than some threshold. Stochastic on/off decisions are studied in [12], and stochastic optimization methods are used in [13], [14]. For other related work on predictive shut-down and wake-up, see [7].

Surprisingly, although component shut-down and performance scaling are widely used mechanisms in power-efficient computing, little is known about them from a queuing theoretic perspective, especially when component shut-down is considered jointly with performance scaling. The power-performance tradeoff in these settings is not well understood, which often results in suboptimal designs. We aim to study the fundamental interplay between these system operations, including processing speed, on/off decisions, and server farm size, from a queuing theoretic point of view. Our results yield clear design guidance. In particular, we show that there are sweet spots in power-efficient computing: optimal processing speeds, in combination with other system parameter settings, that yield the greatest power savings. Somewhat surprisingly, these results contrast with the conventional wisdom underlying many protocols, such as the "race-to-halt" mechanism. Race-to-halt suggests running the processor as fast as possible and then shutting it down. In contrast, the sweet spots we identify show that it can be more power-efficient (for a given computational performance target) to run the processor more slowly for longer. To develop these results, we first study the interplay between a fixed-threshold reactive power control mechanism and DVFS to identify the optimal operating settings. Optimal settings also exist for the threshold mechanism with job batching, i.e., accumulating a certain number of jobs before waking up the system. We then extend the analysis to the multi-server case, where we consider the relation between server farm size and processing speed.

The rest of the paper is organized as follows. In Section II we present the server model. In Section III we present the analysis for the single server case. The multi-server case is discussed in Section IV. We conclude in Section V.

II. Server Model 

We model each server as a computational entity that processes jobs. Each server is equipped with a DVFS mechanism, a conventional method widely used to trade off power consumption against processing speed by changing the operating voltage and clock frequency. We assume the clock frequency can be scaled by a factor $f \in [0, 1]$ and that, under DVFS, the time it takes to process each job is exponentially distributed with mean $1/(\mu f)$, where $\mu$ is the service rate at full speed. For simplicity, we assume the processing times of all jobs are independent and identically distributed. Setting $f = 1$ yields the maximum processing speed, and setting $f = 0$ stops the server from processing jobs, i.e., the server is in clock-gated mode.

The dynamic power consumption of a system supporting DVFS is proportional to $V^2 f$, where $V$ is the supply voltage and $f$ is the clock frequency scaling factor. The supply voltage is determined by the frequency and can be reduced when the clock frequency is reduced; since $V$ scales roughly linearly with $f$, this results in a cubic reduction in power consumption. Therefore we model the power consumed by the server as $P_0 f^3 + C$, where $P_0$ is the maximum power drawn by the computing entity itself, e.g., the CPU. The second term $C$ is the average power drawn by peripherals such as DRAM, hard disk drives (HDDs), and network interface cards (NICs). This can be thought of as the "infrastructure" cost incurred by keeping the computational unit on and ready to process jobs. Note that when $f = 0$, i.e., the server is in clock-gated mode, the power consumed by the server is the peripheral power $C$. This differs from the shut-down mode, in which the power consumption is zero. When the server is shut down, there is a wake-up penalty in terms of time and power. For ease of illustration, we model the peripheral power $C$ as independent of $f$. In practice, the peripheral power also depends on the system operation, exhibiting different values in active, idle and sleep modes [5].

Fig. 2. Threshold mechanism.

III. Single Server Analysis 

In this section we provide our analysis for the single server case. We assume jobs arrive according to a Poisson process with arrival rate $\lambda$. We study two conventional power-saving operations, namely the threshold mechanisms with and without job batching.

A. Threshold mechanism 

We first describe the mechanism without job batching. 

Definition 1 (Threshold Mechanism). The server processes jobs until the queue is empty. It then waits for a fixed amount of time, the waiting threshold $\tau_c$. If the next job arrives within this waiting threshold $\tau_c$, the server processes the job and resumes normal operation. Otherwise, the server shuts down: the whole platform (CPU and peripherals) is powered down and consumes zero power. If the next job arrives after the waiting threshold $\tau_c$ (thus after the server has powered off), the server takes time $\tau_s$ to wake up before processing the job.

A pair of sample paths illustrating the operation of this mechanism is provided in Figure 2. The upper sample path indicates queue occupancy. The lower sample path is binary, indicating when the server is on and off. For ease of illustration, we assume that whenever the server is not shut down, its power consumption is constant over time and determined by the frequency scaling $f$. Our analysis can be easily extended to the case where the power drawn during $\tau_c$ and $\tau_s$ differs from that of normal operation.

Surprisingly, for such a widely used mechanism, to the best of our knowledge it has not been thoroughly studied from a queuing theoretic perspective. Indeed, it is not immediately clear how the mean response time and power consumption are related under frequency scaling $f$ and peripheral power $C$. In current implementations, the threshold value $\tau_c$ is chosen as a fixed value, mostly based on operators' own experience [10]. We investigate, via a queuing theoretic analysis, how the waiting threshold $\tau_c$, frequency scaling $f$ and wake-up latency $\tau_s$ jointly affect the power and mean response time of such systems. Our results reveal that it is important to determine these operating parameters jointly. Naively picking $\tau_c$ too large or too small may lead to poor power efficiency.

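The following theorem summarizes the relationship between the mean response time $E[R]$ and the mean power consumption $E[P]$.

Theorem 1. The mean response time and mean power consumption of a server using the threshold mechanism are given by:

$$E[R] = \frac{1}{\mu f - \lambda} + \frac{(2\tau_s + \lambda\tau_s^2)\,e^{-\lambda\tau_c}}{2\left(1 + \lambda\tau_s e^{-\lambda\tau_c}\right)}, \tag{1}$$

$$E[P] = (P_0 f^3 + C)\left(1 - \frac{\left(1 - \frac{\lambda}{\mu f}\right)e^{-\lambda\tau_c}}{1 + \lambda\tau_s e^{-\lambda\tau_c}}\right). \tag{2}$$

Proof: It is known that the mean response time of an M/G/1 queue in which the first customer of each busy period experiences a random delay $D$ is given by [15]:

$$E[R] = E[S] + \frac{1 + c_s^2}{2}\cdot\frac{\rho}{1-\rho}\,E[S] + \frac{2E[D] + \lambda E[D^2]}{2\left(1 + \lambda E[D]\right)}, \tag{3}$$

where $E[S] = 1/(\mu f)$ is the mean service time, $\rho = \lambda/(\mu f)$ is the utilization, and $c_s^2$ is the squared coefficient of variation of the service time. In our case the random delay is $D = 0$ if $0 < T < \tau_c$ and $D = \tau_s$ if $T > \tau_c$, where $T$ is the time that elapses until the first arrival after the server runs out of jobs. The random variable $T$ is exponentially distributed with parameter $\lambda$. Therefore $E[D]$ can be calculated as:

$$E[D] = \tau_s\,P(T > \tau_c) = \tau_s e^{-\lambda\tau_c}. \tag{4}$$

Similarly, $E[D^2] = \tau_s^2 e^{-\lambda\tau_c}$. Plugging these into (3) with $c_s^2 = 1$ for the M/M/1 queue, we obtain the mean response time (1).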

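The power expression can be derived as follows. Note that

$$E[P] = (P_0 f^3 + C)(1 - f_{\text{off}}), \tag{5}$$

where $f_{\text{off}}$ is the fraction of time the server is off. Now consider a cycle of length $L$ that starts when the queue becomes empty and ends the next time the queue becomes empty. Since the cycle starts and ends with zero jobs in the queue, the expected number of arrivals equals the expected number of completions. Jobs are served at rate $\mu f$ except during the idle period $T$ (with mean $1/\lambda$) and the delay $D$, so the following equality holds:

$$\lambda E[L] = \mu f\left(E[L] - \frac{1}{\lambda} - E[D]\right). \tag{6}$$

Within this cycle, the server shuts down only when the next job arrives after $\tau_c$, and it then stays off for $T - \tau_c$. Thus $f_{\text{off}}$ can be calculated as:

$$f_{\text{off}} = \frac{\int_{\tau_c}^{\infty}(t - \tau_c)\,\lambda e^{-\lambda t}\,dt}{E[L]} = \frac{\left(1 - \frac{\lambda}{\mu f}\right)e^{-\lambda\tau_c}}{1 + \lambda\tau_s e^{-\lambda\tau_c}}. \tag{7}$$

Plugging $f_{\text{off}}$ into (5), we obtain the power consumption (2).

■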
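The following Python sketch makes Theorem 1 concrete. It evaluates (1) and (2) by following the proof step by step: first $E[D]$ and $E[D^2]$, then the cycle length from (6), and finally $f_{\text{off}}$ from (7). The function name and the example operating point ($f = 0.5$, $\tau_c = 5$) are illustrative assumptions and are not part of the original analysis.

```python
import math


def threshold_metrics(lam, mu, f, tau_c, tau_s, P0, C):
    """Mean response time (1) and mean power (2) of the threshold mechanism.

    lam: Poisson arrival rate; mu: service rate at full speed (f = 1);
    f: frequency scaling; tau_c: waiting threshold (math.inf = never shut down);
    tau_s: wake-up time; P0: maximum CPU power; C: peripheral power.
    """
    if not lam < mu * f:
        raise ValueError("unstable queue: need lam < mu * f")
    rho = lam / (mu * f)
    p_shut = math.exp(-lam * tau_c)          # P(T > tau_c): the server shuts down
    ED = tau_s * p_shut                      # E[D], eq. (4)
    ED2 = tau_s ** 2 * p_shut                # E[D^2]
    ER = 1 / (mu * f - lam) + (2 * ED + lam * ED2) / (2 * (1 + lam * ED))
    EL = (1 / lam + ED) / (1 - rho)          # mean cycle length solved from eq. (6)
    off_time = p_shut / lam                  # integral in the numerator of eq. (7)
    f_off = off_time / EL                    # fraction of time off, eq. (7)
    EP = (P0 * f ** 3 + C) * (1 - f_off)     # eq. (5)
    return ER, EP


ER, EP = threshold_metrics(lam=0.1, mu=1.0, f=0.5, tau_c=5.0, tau_s=10.0, P0=150.0, C=70.0)
print(f"E[R] = {ER:.4f}, E[P] = {EP:.3f}")
```

Passing `tau_c=math.inf` gives $e^{-\lambda\tau_c} = 0$, which recovers the always-on M/M/1 case.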

From Theorem 1 we make the following observations. First, when $\tau_c = \infty$, the server never shuts down, and the mean response time (1) and power consumption (2) reduce to:

$$E[R] = \frac{1}{\mu f - \lambda}, \qquad E[P] = P_0 f^3 + C, \tag{8}$$

which are the mean response time and power consumption of an M/M/1 queue with frequency scaling $f$.

When $\tau_s = 0$, i.e., the server incurs no delay to wake up, the mean response time reduces to the M/M/1 case. The power consumption is minimized by picking $\tau_c = 0$, since $f_{\text{off}} = (1 - \lambda/(\mu f))e^{-\lambda\tau_c}$ is then decreasing in $\tau_c$. Thus we have:

$$E[R] = \frac{1}{\mu f - \lambda}, \qquad E[P] = \frac{\lambda}{\mu f}\,(P_0 f^3 + C). \tag{9}$$

This means that if there is no cost to wake up a server, the server should shut down immediately when the queue becomes empty. However, note that the power-delay tradeoff is not monotonic: there is an optimal frequency that minimizes the power consumption (cf. Figure 4). In other words, running slowly (and incurring a large delay) does not always lead to more power savings.
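Setting the derivative of (9) with respect to $f$ to zero gives a closed-form candidate for the optimal frequency. Writing $E[P] = \frac{\lambda}{\mu}\left(P_0 f^2 + \frac{C}{f}\right)$ yields $f^* = (C/(2P_0))^{1/3}$, valid whenever this value lies in the stable range $(\lambda/\mu, 1]$. This closed form is our own step and is not stated in the original analysis. The sketch below, which uses hypothetical function names, checks it against a grid search. It takes $P_0 = 150$, $\mu = 1$ and $\lambda = 0.1$ as in Figure 4, together with a few illustrative values of $C$.

```python
def power_no_wakeup_cost(f, lam, mu, P0, C):
    """Eq. (9): mean power when tau_s = tau_c = 0; requires lam < mu * f."""
    return lam / (mu * f) * (P0 * f ** 3 + C)


def optimal_frequency(lam, mu, P0, C, grid=20_000):
    """Return (grid-search optimum, closed-form optimum) over f in (lam/mu, 1]."""
    f_min = lam / mu
    # Stable frequencies strictly above lam/mu, up to full speed f = 1.
    candidates = [f_min + (1 - f_min) * k / grid for k in range(1, grid + 1)]
    f_grid = min(candidates, key=lambda f: power_no_wakeup_cost(f, lam, mu, P0, C))
    # d/df [P0 f^2 + C/f] = 0  =>  f* = (C / (2 P0))^(1/3); only meaningful
    # when f* falls inside (lam/mu, 1], which holds for the values below.
    f_closed = (C / (2 * P0)) ** (1 / 3)
    return f_grid, f_closed


for C in (10.0, 70.0, 100.0):
    f_grid, f_closed = optimal_frequency(lam=0.1, mu=1.0, P0=150.0, C=C)
    print(f"C={C:g}: grid f*={f_grid:.4f}, closed form f*={f_closed:.4f}")
```

Under (9) the optimum depends only on the ratio $C/P_0$. A larger peripheral power therefore pushes the sweet spot toward a higher frequency.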

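For a fixed nonzero $\tau_s$, there is an optimal $(\tau_c, f)$ pair that minimizes the power consumption for a given delay performance. To see this, fix $E[R] = R'$; from (1) we obtain the relationship between $f$ and $\tau_c$:

$$e^{-\lambda\tau_c} = \frac{2\left(R' - \frac{1}{\mu f - \lambda}\right)}{\tau_s(2 + \lambda\tau_s) - 2\lambda\tau_s\left(R' - \frac{1}{\mu f - \lambda}\right)}. \tag{10}$$

Plugging it into (2), we see that the optimal frequency scaling $f$ minimizes the following expression. The minimization is taken over the values of $f$ for which (10) yields a valid threshold $\tau_c \ge 0$:

$$E[P] = (P_0 f^3 + C)\left(1 - \left(1 - \frac{\lambda}{\mu f}\right)\frac{2\left(R' - \frac{1}{\mu f - \lambda}\right)}{\tau_s(2 + \lambda\tau_s)}\right). \tag{11}$$

Thus we have an optimal $(\tau_c, f)$ pair (cf. Figure 5). This suggests that one should not set $\tau_c$ and $f$ independently: they are coupled and depend on the quality-of-service requirement.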
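The coupling in (10) and (11) can be explored numerically. The sketch below fixes a delay target $R'$, derives $\tau_c$ from (10) for each $f$ on a grid, and evaluates (11). The helper names are hypothetical, and the search is a plain grid search rather than any particular optimizer. As an illustrative target, it uses the delay obtained with $\tau_c = 0$ and $f = 1$ (race-to-halt), with the parameters later used for Figure 5: $P_0 = 150$, $C = 70$, $\lambda = 0.1$, $\mu = 1$, $\tau_s = 10$.

```python
import math


def tau_c_for_target(f, R_target, lam, mu, tau_s):
    """Eq. (10): threshold tau_c giving E[R] = R_target at frequency f, or None."""
    if not lam < mu * f:
        return None
    A = R_target - 1 / (mu * f - lam)        # delay budget left for wake-ups
    denom = tau_s * (2 + lam * tau_s) - 2 * lam * tau_s * A
    if A <= 0 or denom <= 0:
        return None
    x = 2 * A / denom                        # x = exp(-lam * tau_c)
    if x > 1:
        return None                          # would require tau_c < 0
    return -math.log(x) / lam


def power_for_target(f, R_target, lam, mu, tau_s, P0, C):
    """Eq. (11): mean power at frequency f when tau_c is set by eq. (10)."""
    A = R_target - 1 / (mu * f - lam)
    f_off = (1 - lam / (mu * f)) * 2 * A / (tau_s * (2 + lam * tau_s))
    return (P0 * f ** 3 + C) * (1 - f_off)


def best_pair(R_target, lam, mu, tau_s, P0, C, grid=10_000):
    """Grid search over f in (0, 1] for the (tau_c, f) pair minimizing eq. (11)."""
    best = None                              # (tau_c, f, E[P])
    for k in range(1, grid + 1):
        f = k / grid
        tau_c = tau_c_for_target(f, R_target, lam, mu, tau_s)
        if tau_c is None:
            continue
        p = power_for_target(f, R_target, lam, mu, tau_s, P0, C)
        if best is None or p < best[2]:
            best = (tau_c, f, p)
    return best


lam, mu, tau_s, P0, C = 0.1, 1.0, 10.0, 150.0, 70.0
# Target: the delay of race-to-halt (tau_c = 0, f = 1) under the same parameters.
R_rth = 1 / (mu - lam) + (2 * tau_s + lam * tau_s ** 2) / (2 * (1 + lam * tau_s))
P_rth = (P0 + C) * (lam / mu + lam * tau_s) / (1 + lam * tau_s)
tau_c, f, p = best_pair(R_rth, lam, mu, tau_s, P0, C)
print(f"race-to-halt: E[R]={R_rth:.3f}, E[P]={P_rth:.1f}")
print(f"best pair: tau_c={tau_c:.2f}, f={f:.4f}, E[P]={p:.1f}")
```

With these parameters, the pair found by the search meets the same delay at a lower power than race-to-halt. This is the behavior the text attributes to Figure 5.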

The race-to-halt mechanism is a special case of the threshold mechanism with $\tau_c = 0$ and $f = 1$. That is, the server runs as fast as it can when the queue starts to build up and shuts down immediately after it clears all the jobs. The mean response time and power consumption reduce to:

$$E[R] = \frac{1}{\mu - \lambda} + \frac{2\tau_s + \lambda\tau_s^2}{2(1 + \lambda\tau_s)}, \tag{12}$$

$$E[P] = (P_0 + C)\,\frac{\lambda/\mu + \lambda\tau_s}{1 + \lambda\tau_s}. \tag{13}$$

The power consumption (13) is a monotonically increasing function of $\lambda$. The mean response time is not: the first term of (12) increases with $\lambda$ while the second decreases, so there is a $\lambda$ that minimizes the delay.
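The following sketch evaluates (12) and (13) as functions of $\lambda$. It also uses a closed-form stationary point, which is our own derivation: setting the derivative of (12) with respect to $\lambda$ to zero gives $(1+\lambda\tau_s)/(\mu-\lambda) = \tau_s/\sqrt{2}$. That equation has a root in $(0, \mu)$ only when $\tau_s > \sqrt{2}/\mu$. The function names and parameter values are illustrative.

```python
import math


def race_to_halt(lam, mu, tau_s, P0, C):
    """Eqs. (12)-(13): threshold mechanism with tau_c = 0 and f = 1 (needs lam < mu)."""
    ER = 1 / (mu - lam) + (2 * tau_s + lam * tau_s ** 2) / (2 * (1 + lam * tau_s))
    EP = (P0 + C) * (lam / mu + lam * tau_s) / (1 + lam * tau_s)
    return ER, EP


def delay_minimizing_rate(mu, tau_s):
    """Stationary point of eq. (12) in lam (our derivation, not from the paper).

    dE[R]/dlam = 0  <=>  (1 + lam * tau_s) / (mu - lam) = tau_s / sqrt(2),
    which has a root in (0, mu) only when tau_s > sqrt(2) / mu.
    """
    s = tau_s / math.sqrt(2)
    if s * mu <= 1:
        return None                          # delay is increasing in lam on (0, mu)
    return (s * mu - 1) / (tau_s + s)


lam_star = delay_minimizing_rate(mu=1.0, tau_s=10.0)
ER, EP = race_to_halt(lam_star, mu=1.0, tau_s=10.0, P0=150.0, C=70.0)
print(f"delay-minimizing lam = {lam_star:.4f}: E[R] = {ER:.3f}, E[P] = {EP:.2f}")
```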



B. Threshold mechanism with job batching



In this section we extend our analysis to consider the 
threshold mechanism with job batching. 

Definition 2 (Threshold Mechanism with Job Batching). This mechanism is the same as the one in Definition 1 with the following difference. When the shut-down server sees the first job arrival, it remains shut down for an additional time $\tau_w$ before waking up. As before, wake-up takes time $\tau_s$.

Fig. 3. Threshold mechanism with job batching.



As for the basic threshold mechanism, Figure 3 provides a pair of sample paths illustrating the operation of the modified mechanism. The upper sample path indicates queue occupancy. The lower sample path is binary, indicating when the server is on and off. Note the additional parameter $\tau_w$ relative to the basic mechanism. The intuition behind this mechanism is that by batching more jobs at the beginning of a busy period, it is less likely that the server will run out of jobs in the near future. This mechanism follows the spirit of the periodic power-on and power-off operation in MapReduce clusters and of batching database queries (see [9] and the references therein). However, it is not clear how $\tau_w$ affects power and mean response time, nor how it interacts with $f$, $\tau_c$ and $\tau_s$. We derive the mean response time and power consumption of the threshold mechanism with job batching in Lemma 2.

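Lemma 2. The mean response time and power consumption of the threshold mechanism with job batching are

$$E[R] = \frac{1}{\mu f - \lambda} + \frac{\left(2(\tau_s + \tau_w) + \lambda(\tau_s + \tau_w)^2\right)e^{-\lambda\tau_c}}{2\left(1 + \lambda(\tau_s + \tau_w)e^{-\lambda\tau_c}\right)}, \tag{14}$$

$$E[P] = (P_0 f^3 + C)\left(1 - \frac{\left(1 - \frac{\lambda}{\mu f}\right)(1 + \lambda\tau_w)\,e^{-\lambda\tau_c}}{1 + \lambda(\tau_s + \tau_w)e^{-\lambda\tau_c}}\right). \tag{15}$$

Proof: The proof follows that of Theorem 1. In particular, the random delay now becomes $D = 0$ if $0 < T < \tau_c$ and $D = \tau_s + \tau_w$ if $T > \tau_c$. We obtain (14) by computing $E[D]$ and $E[D^2]$ and plugging them into (3) with $c_s^2 = 1$. The power consumption is derived in the same way as in Theorem 1, with $f_{\text{off}}$ replaced by:

$$f_{\text{off}} = \frac{\int_{\tau_c}^{\infty}(t - \tau_c)\,\lambda e^{-\lambda t}\,dt + \tau_w e^{-\lambda\tau_c}}{E[L]}, \tag{16}$$

since the server also stays off for the additional time $\tau_w$ whenever it has shut down. The rest of the proof follows that of Theorem 1. ■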

Note that when $\tau_w = 0$, the system reduces to the threshold mechanism. When $\tau_w$ is very large, the system waits a long time before waking up. The mean response time therefore grows without bound, and the power consumption converges to $\frac{\lambda}{\mu f}(P_0 f^3 + C)$.

Under a mean response time budget $E[R] = R'$, there is an optimal triple $(\tau_c, f, \tau_w)$ that minimizes the power consumption. In particular, when $\tau_c = 0$, the mean response time and power consumption reduce to:

$$E[R] = \frac{1}{\mu f - \lambda} + \frac{2(\tau_s + \tau_w) + \lambda(\tau_s + \tau_w)^2}{2\left(1 + \lambda(\tau_s + \tau_w)\right)}, \tag{17}$$

$$E[P] = (P_0 f^3 + C)\left(1 - \frac{\left(1 - \frac{\lambda}{\mu f}\right)(1 + \lambda\tau_w)}{1 + \lambda(\tau_s + \tau_w)}\right). \tag{18}$$

Further, with $f = 1$, the mechanism reduces to the race-to-halt mechanism with job batching. We simulate its mean response time (17) and power consumption (18) in Figure 6.

Fig. 4. Threshold mechanism, $\tau_s = \tau_c = 0$. Note that there is an optimal frequency $f^*$ that minimizes the power consumption.
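The following sketch evaluates Lemma 2 in the same way as Theorem 1. The function name is hypothetical, and the operating point ($f = 0.5$, $\tau_s = 10$, $\tau_c = 0$) is illustrative. Its sweep over $\tau_w$ shows the limits discussed above: $\tau_w = 0$ reproduces Theorem 1, and as $\tau_w$ grows the delay increases without bound while the power approaches $\frac{\lambda}{\mu f}(P_0 f^3 + C)$.

```python
import math


def batching_metrics(lam, mu, f, tau_c, tau_s, tau_w, P0, C):
    """Eqs. (14)-(15): threshold mechanism with job batching period tau_w."""
    if not lam < mu * f:
        raise ValueError("unstable queue: need lam < mu * f")
    rho = lam / (mu * f)
    p_shut = math.exp(-lam * tau_c)            # P(the server shuts down in a cycle)
    d = tau_s + tau_w                          # delay of the first job after shut-down
    ED, ED2 = d * p_shut, d ** 2 * p_shut
    ER = 1 / (mu * f - lam) + (2 * ED + lam * ED2) / (2 * (1 + lam * ED))
    EL = (1 / lam + ED) / (1 - rho)            # mean cycle length, as in eq. (6)
    off_time = p_shut / lam + tau_w * p_shut   # numerator of eq. (16)
    EP = (P0 * f ** 3 + C) * (1 - off_time / EL)
    return ER, EP


# Illustrative sweep: tau_c = 0, f = 0.5, growing batching period tau_w.
for tau_w in (0.0, 10.0, 100.0, 1e4):
    ER, EP = batching_metrics(0.1, 1.0, 0.5, 0.0, 10.0, tau_w, 150.0, 70.0)
    print(f"tau_w = {tau_w:g}: E[R] = {ER:.2f}, E[P] = {EP:.3f}")
```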

C. Simulation Results 

In this section we present our simulation results for the fixed threshold mechanisms. We choose the simulation parameters based on values reported for real data traces in the literature (see [5] and the references therein).

1) Threshold mechanism: We consider a computing facility with $P_0 = 150$, $\mu = 1$ and $\lambda = 0.1$, which models a low-utilization scenario. If the wake-up cost is negligible, i.e., $\tau_s = 0$, then from the previous analysis we have $\tau_c = 0$, and the mean response time and power consumption reduce to (9). Figure 4 illustrates the power-delay tradeoff for various $C$ when $\tau_s = \tau_c = 0$. Notice that there is an optimal frequency scaling that minimizes the power consumption. The results suggest that running jobs with large delay (using a low frequency) may actually consume more power.

In a more realistic scenario where $\tau_s \neq 0$, Figure 5 validates our argument that there is an optimal $(\tau_c, f)$ pair that jointly minimizes the power consumption given a target mean response time (cf. (11)). We set $P_0 = 150$, $C = 70$, $\lambda = 0.1$, $\mu = 1$ and $\tau_s = 10$. Notice that for a given mean response time $R'$, there is an optimal $\tau_c$ and an associated frequency scaling $f$ that minimize the power consumption. Note also that for the mean response time achieved by the race-to-halt mechanism ($\tau_c = 0$, $f = 1$), we can pick another $(\tau_c, f)$ pair that yields smaller power consumption.
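As an independent cross-check of Theorem 1, the sketch below simulates the mechanism job by job using a simple FCFS recursion. With $\tau_w > 0$ it also checks Lemma 2. This is our own check and not the simulation used to produce Figures 4-6. It assumes exponential interarrival and service times as in the model, treats the wake-up period as powered on, and uses a hypothetical operating point ($f = 0.5$, $\tau_c = 5$) together with $P_0 = 150$, $C = 70$, $\lambda = 0.1$, $\mu = 1$ and $\tau_s = 10$. The simulated averages should agree with (1) and (2) up to sampling error.

```python
import math
import random


def simulate_threshold(lam, mu, f, tau_c, tau_s, P0, C, tau_w=0.0, n_jobs=200_000, seed=1):
    """Job-by-job FCFS simulation of the threshold mechanism.

    Assumes, as in Section II, that the server draws P0*f**3 + C whenever it is
    not shut down (including during wake-up). tau_w > 0 adds the batching
    delay of Definition 2. Returns (mean response time, mean power).
    """
    rng = random.Random(seed)
    t = 0.0            # arrival time of the current job
    last_dep = 0.0     # departure time of the previous job; queue empty at t = 0
    total_resp = 0.0
    total_off = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        if t <= last_dep:                 # server busy or waking up: job queues
            start = last_dep
        elif t - last_dep <= tau_c:       # idle but still on: serve immediately
            start = t
        else:                             # server shut down at last_dep + tau_c
            total_off += (t - last_dep - tau_c) + tau_w
            start = t + tau_w + tau_s     # optional batching, then wake-up
        last_dep = start + rng.expovariate(mu * f)
        total_resp += last_dep - t
    mean_power = (P0 * f ** 3 + C) * (1 - total_off / last_dep)
    return total_resp / n_jobs, mean_power


lam, mu, f, tau_c, tau_s, P0, C = 0.1, 1.0, 0.5, 5.0, 10.0, 150.0, 70.0
x = math.exp(-lam * tau_c)
ER_th = 1 / (mu * f - lam) + (2 * tau_s + lam * tau_s ** 2) * x / (2 * (1 + lam * tau_s * x))
EP_th = (P0 * f ** 3 + C) * (1 - (1 - lam / (mu * f)) * x / (1 + lam * tau_s * x))
ER_sim, EP_sim = simulate_threshold(lam, mu, f, tau_c, tau_s, P0, C)
print(f"E[R]: theory {ER_th:.3f}, simulation {ER_sim:.3f}")
print(f"E[P]: theory {EP_th:.2f}, simulation {EP_sim:.2f}")
```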
Fig. 5. Threshold mechanism, $\tau_s = 10$. Different target delays correspond to different $(\tau_c, f)$ pairs.

Fig. 6. Threshold mechanism with job batching, $\tau_s = 10$ and $\tau_c = 0$. Different target delays correspond to different $(\tau_w, f)$ pairs. Note that the curve with $\tau_w = 0$ is the same as the one with $\tau_c = 0$ in Figure 5.



2) Threshold mechanism with job batching: We simulate the mean response time (17) and power (18) for the threshold mechanism with job batching. We set $P_0 = 150$, $C = 70$, $\lambda = 0.1$, $\mu = 1$, $\tau_s = 10$ and $\tau_c = 0$. The frequency scaling $f$ and batching period $\tau_w$ are kept as variables. The power-delay tradeoff is shown in Figure 6. Notice that for some mean response times achieved by the race-to-halt mechanism, we can pick another $(\tau_w, f)$ pair that yields smaller power consumption. The intuition is that to save power, one typically prefers a smaller $f$ over $f = 1$. However, to maintain the same delay performance, one needs to compensate for the increase in delay caused by the smaller $f$ by picking a smaller $\tau_w$. Meanwhile, one should not decrease $f$ too much either: the server then stays on longer, and the peripheral power $C$ soon becomes the dominating factor. We also note that the power-delay tradeoff is monotonic for the race-to-halt scheme: increasing $\tau_w$ always incurs larger delay and lower power consumption.
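The comparison with race-to-halt can be reproduced with a small search over (17) and (18). For each $f$ on a grid, the sketch below solves (17) for the $\tau_w$ that meets the race-to-halt delay exactly, then evaluates (18). The helper name is hypothetical, and the race-to-halt batching period $\tau_w = 10$ is an illustrative choice. With these parameters, the best pair uses both a smaller $f$ and a smaller $\tau_w$, consistent with the intuition above.

```python
import math


def batching_power_for_target(f, R_target, lam, mu, tau_s, P0, C):
    """For tau_c = 0, choose tau_w so that eq. (17) gives E[R] = R_target and
    evaluate eq. (18). Returns (tau_w, E[P]), or None if no tau_w >= 0 works."""
    if not lam < mu * f:
        return None
    A = R_target - 1 / (mu * f - lam)       # delay budget left for the batching term
    if A <= 0:
        return None
    # With d = tau_s + tau_w, (2d + lam d^2) / (2 (1 + lam d)) = A is the
    # quadratic lam d^2 + (2 - 2 A lam) d - 2 A = 0; take its positive root.
    b = 2 - 2 * A * lam
    d = (-b + math.sqrt(b * b + 8 * A * lam)) / (2 * lam)
    tau_w = d - tau_s
    if tau_w < 0:
        return None                          # target missed even with tau_w = 0
    f_off = (1 - lam / (mu * f)) * (1 + lam * tau_w) / (1 + lam * d)
    return tau_w, (P0 * f ** 3 + C) * (1 - f_off)


lam, mu, tau_s, P0, C = 0.1, 1.0, 10.0, 150.0, 70.0
tau_w0 = 10.0                                # hypothetical race-to-halt batching period
d0 = tau_s + tau_w0
R_rth = 1 / (mu - lam) + (2 * d0 + lam * d0 ** 2) / (2 * (1 + lam * d0))
P_rth = (P0 + C) * (1 - (1 - lam / mu) * (1 + lam * tau_w0) / (1 + lam * d0))

best = None                                  # (f, tau_w, E[P])
for k in range(1, 1001):
    res = batching_power_for_target(k / 1000, R_rth, lam, mu, tau_s, P0, C)
    if res is not None and (best is None or res[1] < best[2]):
        best = (k / 1000, res[0], res[1])
print(f"race-to-halt (f=1, tau_w={tau_w0:g}): E[R]={R_rth:.2f}, E[P]={P_rth:.1f}")
print(f"best pair: f={best[0]:.3f}, tau_w={best[1]:.2f}, E[P]={best[2]:.1f}")
```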



IV. Multi-server Analysis 

In this section we extend our queuing analysis to study the interplay between frequency scaling and facility size, i.e., the number of servers. We study two simple multi-server scenarios, namely flow splitting and job splitting. We observe that even in such simple settings there are pairs of operating frequency and plant size that minimize the power consumption.

Consider $n$ parallel homogeneous servers with a centralized job dispatcher. Jobs arrive at the dispatcher according to a Poisson process with rate $\lambda$. The job dispatcher distributes jobs to servers according to some rule. In this section we consider two simple rules: flow splitting using Bernoulli splitting, and job splitting using fork-join. We assume all servers use the same frequency scaling $f$, each consuming $P_0 f^3 + C$ power.

A. Flow splitting 

In the flow splitting case, the job dispatcher sends each job to one of the $n$ servers chosen uniformly at random (Bernoulli splitting). Each server then behaves as an M/M/1 queue with Poisson arrival rate $\lambda/n$.

Lemma 3. The mean response time and power consumption of the flow splitting multi-server system are:

$$E[R] = \frac{1}{\mu f - \lambda/n}, \qquad E[P] = n\,(P_0 f^3 + C). \tag{19}$$

For any given $E[R] = R'$, simple algebraic calculations show that there is an optimal pair of frequency scaling and plant size that minimizes the power consumption. In particular, consider the large-delay limit $R' \to \infty$, where the constraint reduces to stability ($n f \to \lambda/\mu$), and treat $n$ as continuous. The optimal frequency scaling $f$ and plant size $n$ are then given by:

$$f^* = \left(\frac{C}{2P_0}\right)^{1/3}, \qquad n^* = \frac{\lambda}{\mu}\left(\frac{2P_0}{C}\right)^{1/3}. \tag{20}$$
This suggests that for power-efficient computation, it is not necessarily true that running as fast as possible, or consolidating jobs onto as few servers as possible, offers better power efficiency. This phenomenon is visualized in Figure 7. We conjecture that similar observations hold for round-robin scheduling, where the inter-arrival time between jobs at each server is Erlang-$n$ distributed.
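For a finite delay target, the same trade-off can be computed directly from (19). The sketch below uses a hypothetical function name. It relies on the fact that, for a fixed $n$, power increases with $f$, so the slowest frequency that meets the target is optimal for that $n$; it then scans integer values of $n$. With the Figure 7 parameters it picks $n = 2$ for a very loose target, close to the continuous large-delay value $n^* = (\lambda/\mu)(2P_0/C)^{1/3} \approx 2.2$. It picks more servers for a tight target.

```python
def flow_splitting_best(R_target, lam, mu, P0, C, n_max=50):
    """Minimize eq. (19) power n (P0 f^3 + C) subject to E[R] <= R_target.

    For fixed n, 1/(mu f - lam/n) <= R_target  <=>  f >= (lam/n + 1/R_target)/mu,
    and power increases with f, so the slowest feasible f is optimal for that n.
    Returns (n, f, E[P]) for the best n in 1..n_max, or None if none is feasible.
    """
    best = None
    for n in range(1, n_max + 1):
        f = (lam / n + 1 / R_target) / mu
        if f > 1:
            continue                        # even full speed misses the target
        power = n * (P0 * f ** 3 + C)
        if best is None or power < best[2]:
            best = (n, f, power)
    return best


# Parameters from the Figure 7 setup: P0 = 150, C = 10, lam = 0.7, mu = 1.
for R_target in (1.5, 5.0, 1000.0):
    n, f, p = flow_splitting_best(R_target, lam=0.7, mu=1.0, P0=150.0, C=10.0)
    print(f"R' = {R_target:g}: n = {n}, f = {f:.3f}, E[P] = {p:.1f}")
```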

B. Job splitting 

In the job splitting case, upon a job arrival the job dispatcher immediately makes $n$ copies of the job and forks them in parallel to the $n$ servers. This models queries to content retrieval databases, where each incoming request can be routed simultaneously to $n$ databases and the system waits for some of them to respond. Servers process requests in parallel, and one queue is maintained at each server. When any $k$ out of the $n$ servers respond, the remaining $n - k$ servers abandon the corresponding requests and the job departs the system. Such a system is often termed an $(n, k)$ fork-join queue [16] in the queuing theory literature. There is no known closed-form solution for the mean response time of the fork-join system, not even for the $(n, n)$ system with general $n$. However, several bounds exist (see, for example, [17]). For the job splitting case, working with these bounds, we find that there is also an optimal combination of frequency scaling $f$ and plant size $n$ that minimizes the power for a given delay budget.

Fig. 7. Power-delay tradeoff with flow splitting.

In both the flow splitting and job splitting cases, packing jobs onto fewer servers requires a faster processor speed to maintain a given delay performance, thus increasing the processing power $P_0 f^3$ of each server. On the other hand, each additional server incurs the fixed peripheral power $C$.

C. Simulation Results 

We simulate the mean response time and power consumption for the multi-server flow splitting case. The job splitting case behaves similarly (omitted due to page limits). We set $P_0 = 150$, $C = 10$, $\lambda = 0.7$ and $\mu = 1$, while the frequency scaling $f$ and the number of servers $n$ are kept as variables. Simulation results are shown in Figure 7. For each $n$, we vary the frequency scaling $f$ to plot the curve. Note that for a fixed mean response time, the power consumption first decreases and then increases with increasing $n$. Intuitively, in one extreme, where jobs can tolerate large delay, the system should run slowly with a small number of servers (cf. (20)). In the other extreme, where jobs demand fast response, the system should run faster with many servers powered on.

V. Conclusion and Future Work 

In this paper we present a queuing theoretic analysis of some 
widely used power-efficient operations in modern computing. 
We analytically characterize the power-delay tradeoff for the 
threshold mechanisms with and without job batching. We 
also analyze the multi-server case. For these mechanisms 
we discover that there oftentimes exist sweet spots: optimal 
combinations of processing speed and other system parameters 
that yield best power efficiency. 

There are many promising future directions, including the investigation of other power-efficient mechanisms. For the single server case, we will consider predictive wake-up and shut-down routines (cf. Figure 1). Such proactive control requires tools to predict traffic and offers improvements in delay. For the multi-server case, we question the power efficiency of many conventional dispatching algorithms, as most of them were not designed for power-efficient computing. We would like to understand the interplay between dispatching mechanisms and other system parameters and to investigate their joint optimality. This should yield design guidance for power-efficient job dispatching routines.